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Abstract 

This study proposes p-th Tobit quantile regression models with endogenous variables. In the first 
stage regression of the endogenous variable on the exogenous variables, the assumption that the a-th 
quantile of the error term is zero is introduced. Then, the residual of this regression model is included in 
the p-th quantile regression model in such a way that the p-th conditional quantile of the new error term 
is zero. The error distribution of the first stage regression is modelled around the zero a-th quantile as¬ 
sumption by using parametric and semiparametric approaches. Since the value of a is a priori unknown, 
it is treated as an additional parameter and is estimated from the data. The proposed models are then 
demonstrated by using simulated data and real data on the labour supply of married women. 

Keywords: asymmetric Laplace distribution; Bayesian Tobit quantile regression; Dirichlet process mix¬ 
ture; endogenous variable; Markov chain Monte Carlo; skew normal distribution; 


1 Introduction 

Since the seminal work of Koenker and Bassett (1978), quantile regression has received substantial scholarly 
attention as an important alternative to conventional mean regression. Indeed, there now exists a large 
literature on the theory of quantile regression (see, for example, Koenker (2005), Yu et al. (2003), and 
Buchinsky (1998) for an overview). Notably, quantile regression can be used to analyse the relationship 
between the conditional quantiles of the response distribution and a set of regressors, while conventional 
mean regression only examines the relationship between the conditional mean of the response distribution 
and the regressors. 

Quantile regression can thus be used to analyse data that include censored responses. Powell (1984; 
1986) proposed a Tobit quantile regression (TQR) model utilising the equivariance of quantiles under mono¬ 
tone transformations. Hahn (1995), Buchinsky and Hahn (1998), Bilias et al. (2000), Chemozhukov and 
Hong (2002), and Tang et al. (2012) considered alternative approaches to estimate TQR. More recent works 
in the area of censored quantile regression include Wang and Wang (2009) for random censoring using lo¬ 
cally weighted censored quantile regression, Wang and Fygenson (2009) for longitudinal data, Chen (2010) 
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and Lin et al. (2012) for doubly censored data using the maximum score estimator and weighted quantile 
regression, respectively, and Xie et al. (2015) for varying coefficient models. 

In the Bayesian framework, Yu and Stander (2007) considered TQR by extending the Bayesian quan¬ 
tile regression model of Yu and Moyeed (2001) and proposed an estimation method based on Markov 
chain Monte Carlo (MCMC). A more efficient Gibbs sampler for the TQR model was then proposed by 
Kozumi and Kobayashi (2011). Further extensions of Bayesian TQR have also been considered. Kottas and 
Krnjajic (2009) and Taddy and Kottas (2012) examined semiparametric and nonparametric models using 
Dirichlet process mixture models. Reich and Smith (2013) considered a semiparametric censored quan¬ 
tile regression model where the quantile process is represented by a linear combination of basis functions. 
To accommodate nonlinearity in data, Zhao and Lian (2015) proposed a single-index model for Bayesian 
TQR. Furthermore, Kobayashi and Kozumi (2012) proposed a model for censored dynamic panel data. 
For variable selection in Bayesian TQR, Ji et al. (2012) applied the stochastic search. Alhamzawi and 
Yu (2014) considered a g-prior distribution with a ridge parameter that depends on the quantile level, and 
Alhamzawi (2014) employed the elastic net. 

As in the case of ordinary least squares, standard quantile regression estimators are biased when one 
or more regressors arc correlated with the error term. Many authors have analysed quantile regression 
for uncensored response variables with endogenous regressors, such as Amemiya (1982), Powell (1983), 
Abadie et al. (2002), Kim and Muller (2004), Ma and Koenker (2006), Chernozhukov and Hansen (2005; 
2006; 2008), and Lee (2007). 

Extending the quantile regression model to simultaneously account for censored response variables and 
endogenous variables is a challenging issue. In the case of the conventional Tobit model with endogenous 
regressors, a number of studies were published in the 1970s and 1980s, such as Nelson and Olsen (1978), 
Amemiya (1979), Heckman (1978), and Smith and Blundell (1986), with more efficient estimators proposed 
by Newey (1987) and Blundell and Smith (1989). On the contrary, few studies have estimated censored 
quantile regression with endogenous regressors. While Blundell and Powell (2007) introduced control vari¬ 
ables as in Lee (2007) to deal with the endogeneity in censored quantile regression, their estimation method 
involved a high dimensional nonparametric estimation and can be computationally cumbersome. Cher¬ 
nozhukov et al. (2014) also introduced control variables to account for endogeneity. They proposed using 
quantile regression and distribution regression (Chernozhukov et al., 2013) to construct the control variables 
and extended the estimation method of Chernozhukov and Hong (2002). 

In the Bayesian framework, mean regression models with endogenous variables have garnered a great 
deal of research attention from both the theoretical and the computational points of view (e.g. Rossi et al., 2005; 
Hoogerheide et al., 2007a, 2007b; Conely et al., 2008; Lopes and Poison, 2014). However, despite the grow¬ 
ing interest in and demand for Bayesian quantile regression, the literature on Bayesian quantile regression 
with endogenous variables remains sparse. Lancaster and Jun (2010) utilised the exponentially tilted empir¬ 
ical likelihood and employed the moment conditions used in Chernozhukov and Hansen (2006). In the spirit 
of Lee (2007), Ogasawara and Kobayashi (2015) employed a simple parametric model using two asym¬ 
metric Laplace distributions for panel quantile regression. However, these methods are only applicable to 
uncensored data. Furthermore, the model of Ogasawara and Kobayashi (2015) can be restrictive because 
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of the shape limitation of the asymmetric Laplace distribution, which can affect the estimates. Indeed, the 
modelling of the first stage error in this approach remains to be discussed. 

Based on the foregoing, this study proposes a flexible parametric Bayesian endogenous TQR model. 
The p-th quantile regression of interest is modelled parametrically following the usual Bayesian quantile re¬ 
gression approach. Following Lee (2007), we introduce a control variable such that the conditional quantile 
of the error term is corrected to be zero and the parameters are correctly estimated. As in the approach of 
Lee (2007), the a-th quantile of the error term in the regression of the endogenous variable on the exogenous 
variables, which is often called the first stage regression, is also assumed to be zero. 

We discuss the modelling approach for the first stage regression and consider a number of parametric 
and semiparametric models based on the extensions of Ogasawara and Kobayashi (2015). Specifically, 
following Wichitaksorn et al. (2014) and Naranjo et al. (2015), we employ the first stage regression models 
based on the asymmetric Laplace distribution, skew normal distribution, and asymmetric exponential power 
distribution, for which the cr-th quantile is always zero and is modelled by the regression function. To 
introduce more flexibility into the tail behaviour of the models based on the asymmetric Laplace and skew 
normal distributions, we also consider a semiparametric extension using the Dirichlet process mixture of 
scale parameters as in Kottas and Krnjajic (2011). The value of a is a priori unknown, while the choice of 
a can affect the estimates. In this study, hence, a is treated as a parameter to incorporate uncertainty and 
is estimated from the data. The performance of the proposed models is demonstrated in a simulation study 
under various settings, which is a novel contribution of the present study. We also illustrate the influence of 
the prior distributions on the posterior in the cases where valid and weak instruments are used. 

The rest of this paper is organised as follows. Section [2] introduces the standard Bayesian TQR model 
with a motivating example. Then, Section [3] proposes Bayesian TQR models to deal with the endogenous 
variables. The MCMC methods adopted to make inferences about the models are also described. The 
simulation study under various settings is presented in Section 0] The models are also illustrated by using 
the real data on the working hours of married women in Section [5] Finally, we conclude in Section [6] 

2 Bayesian TQR 

Suppose that the response variables are observed according to 

Vi = c (y *) = max (0, y*\ , i = l,...,n. 

Then, consider the p- th quantile regression model for y* given by 

V* = APp + i = 

where x, is the vector of regressors, (3 p is the coefficient parameter, and e t is the error term whose p- th 
quantile is zero. The p- th conditional quantile of y* is modelled as Q y *\ x (p) = x'/3 p . The equivariance 
under the monotone transformation c(-) of quantiles implies that the p-th conditional quantile of y is given 
by 

Qy\x.(p) c(Q.y*| x (p)). 
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The TQR model can be estimated by minimising the sum of asymmetrically weighted absolute errors 

n 

min E PpiVi ~ c(x'/3 p )), (1) 

Pp i =1 

where p p (u) = u(p — I(u < 0)) and /(•) denotes the indicator function (Powell, 1986). 

The Bayesian approach assumes that e follows the asymmetric Laplace distribution, since minimising 
© is equivalent to maximising the likelihood function of the asymmetric Laplace distribution (Koenker 
and Machado, 1999; Chernozhukov and Hong, 2003). The probability density function of the asymmetric 
Laplace distribution, denoted by AC(a,p), is given by 

, , , x P(l-P) / P P (e)\ . . 

f AL (e\a,p) = ---exp <—-oo < x < oo, (2) 

where a > 0 is the scale parameter and p S (0, f) is the shape parameter (Yu and Zhang, 2005). The 
mean and variance arc given by E[e] = rj^|—and Var(e) = rr 2 ' 2 '‘, . The p-th quantile of this 
distribution is zero, J° /(e) = p. Assuming the prior distributions for the parameters, the parameters 
are estimated by using the MCMC method (e.g. Yu and Stander, 2007; Kozumi and Kobayashi, 2011). 
Posterior consistency of Bayesian quantile regression based on the asymmetric Laplace distribution was 
shown by Sriram et al. (2013). 

Estimates under the standard Bayesian TQR model are biased when endogenous variables are included 
as regressors. Consider a simple motivating example where the dataset was generated from 

y* = /3o + Pi Xi + 5di + Ui, 

(3) 

(k = 70 + 7i x i + + Vi, 

fori = 1,... ,300, where (/3 0 ,/3i,5) = (1,1,1), ( 70 , 71 , 72 ) = (1,1,1), Xi,w t ~ A/"(0,1) and 


( ~Af(0,S), E = 

1 P 

V Vi J 

_ P 1 _ 


See also Chernozhukov et al. (2014). Note that p expresses the level of endogeneity. While d is an exogenous 
variable when p = 0, d is endogenous when p / 0. Since u\v ~ M(pv, 1 — p 2 ), the model can be rewritten 
as 

y* = Po + Pi Xi + ddt + pv + y/l- p 2 Ui. (4) 

Therefore, the standard model that models the conditional quantile of y* as Pq + Pix + 5d produces biased 
estimates. 

Figure [T] shows the posterior distributions of Pq, Pi, and 6 for the standard model for p = 0.5 obtained 
by using the method of Kozumi and Kobayashi (2011). The vertical lines in the figure indicate the true 
values. In the case of p = 0, the posterior distributions are concentrated around the true values. However, 
in the case of p = 0.6, the posterior distributions are concentrated away from the true values. 
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Figure 1: Posterior distributions of /3q, /3 \, and 5 using the standard Bayesian Tobit median regression 


3 Bayesian Endogenous TQR Model 

3.1 Model 

We propose the following model to deal with the endogenous variables: 

Vi = x.'iP p + 6 p di + rj p (di - z- 7 ) + e*, (5) 

di = z-7 + Vi, (6) 

for i = 1 ,n, where x, is the vector of the exogenous variables whose the first element is 1, di is the 
endogenous variable, z, = (x' ; . u;,)', and Wi is the exogenous variable not included in x$, which is also called 
the instrumental variable. The term di — z'7 = Vi in © is called the control variable and is introduced to 

account for endogeneity. Note that r/ p / 0 indicates di is endogenous. We refer to © as the first stage 

regression and to © as the second stage regression. A similar form is found in Lopes and Poison (2014) in 
the context of the instrumental variable regression for means by using the Cholesky-based prior. 

Following Lee (2007), the error term e, of the standard Bayesian TQR is decomposed into the terms 
Vp(di ~ z' 7 ) and e t . It is assumed that relationship © is specified correctly and the quantile independence 
of ei on z i conditional on v % : 


Qe\d,z(p) = Qe\v,z(p ) = Qe\v(p) = Vp( d ~ z T)- (7) 

As in Lee (2007), we also assume 

Q v \ z (a) = 0, (8) 

where the o-th conditional quantile of is zero for some a € (0,1). 

3.2 First Stage Regression 

We are mainly concerned with modelling the first stage error that satisfies ©. A simple and convenient 
approach is to assume Vi ~ A£(<j), a), i = 1..... n, as in Ogasawara and Kobayashi (2015), since © is 


5 











always satisfied for the asymmetric Laplace distribution. However, the asymmetric Laplace distribution has 
limitations, such as peaky density, restrictive tail behaviour, and skewness. When a model lacks fit to the 
data, the estimate of the conditional quantile would be away from the value such that © truly holds. Then, 
assuming n* is homoskedastic, the estimate of the intercept, 70 , may be biased as well. Consequently, the 
estimate of (3 p0 would be affected through the introduced term r) p (di — z' 7 ). When v t is heteroskedastic, 
the entire coefficient vector would be affected. Therefore, we consider some alternative models for the first 
stage eiTor distribution. 

Recently, Wichitaksorn et al. (2014) considered a class of parametric distributions with a quantile 
constraint of the form (©. including the asymmetric Laplace distribution, and applied them in the con¬ 
text of quantile modelling. Furthermore, Zhu and Zinde-Walsh (2009), Zhu and Galbraith (2011), and 
Naranjo et al. (2015) considered a flexible parametric distribution with the quantile constraint. Based on 
these studies, we also consider the following two distributions to model the first stage error. 

First, we consider the skew normal distribution denoted by SJ\f(4>, a), where c 6 > 0 is the scale param¬ 
eter and a € (0,1) is the shape parameter. The probability density function is given by 

fsN(v\(j>,a) = —exp | — ^4(a - I{v < 0)) 2 | . (9) 


When a = 0.5, the distribution reduces to AC(0, 7 ). The mean and variance are given by E[v\ = y £= 
and Var(n) = ^ ^( 1 ~^ " 2 ~^ (see Wichitaksorn et al., 2014). When the actual error distribution 
is close to the normal distribution, this distribution would lead to better performance than the asymmetric 
Laplace distribution. However, just as the asymmetric Laplace distribution, the skewness and the quantile 
level of the mode are controlled by the single parameter a. 

Second, we consider the asymmetric exponential power distribution treated by Zhu and Zinde-Walsh (2009), 
Zhu and Galbraith (2011), and Naranjo et al. (2015). The probability density function of the asymmetric 
exponential power distribution, denoted by A£V(4>, a, Cl j C 2 )> is given by 

CL 


fAEp(v\(/>,a, Ci, C 2 ) = < 



r 

^ ex p < 

_ 

i ex p< 

_ 


oWr(i+i/Ci) 


(i-«)7/r(i+i/c 2 ) 


C2 


if 

if 


v < 0 , 
v > 0 , 


( 10 ) 


where <j> > 0 is the scale parameter, a E (0,1) is the skewness parameter, Ci > 0 is the shape parameter for 
the left tail, and C 2 > 0 is the shape parameter for the right tail. After some reparameterisation, the distribu¬ 
tion reduces to the asymmetric Laplace distribution when Ci = C 2 = 1 and to the skew normal distribution 
when Ci = C 2 = 2 . The tails of the asymmetric exponential power distribution are controlled separately by 
Ci and C 2 > respectively, and the overall skewness is controlled by a. Although the distribution is more flexi¬ 
ble than the above two distributions, the posterior computation using MCMC would be inefficient, because 
it includes two additional shape parameters and it has no convenient mixture representation, apart from 
the mixture of uniforms that is inefficient, to facilitate an efficient MCMC algorithm. The computational 
efficiency is also compared in Section [4] 

In addition to the three parametric models, we also consider the semiparametric extension of the models 
based on the asymmetric Laplace and skew normal distributions to achieve both flexibility and computa- 
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tional efficiency. More specifically, the following two models using the Dirichlet process mixtures of scales 
arc considered: 


fALDp{v\G) = f f A L(v\(j),a)dG((j)), G~W(a,G 0 ), (11) 

fsNDp(v\G) = //sw(r#, a)dG ((/>), G ~ T>V(a,G 0 ), (12) 

where VV(a , Go) denotes the Dirichlet process with the precision parameter a > 0 and the base measure 
Go- For both models, we set Go = TQ(cQ,do) as it is computationally convenient. While those mixture 
models have the same limitation as the parametric versions in terms of skewness, they extend the tail be¬ 
haviour of the error distribution preserving © (Kottas and Krnjajic, 2009). Hereafter, the models with the 
asymmetric Laplace, skew normal, and asymmetric exponential power first stage errors are respectively 
denoted by AL, SN, and AEP, and those with the Dirichlet process mixtures are denoted by ALDP and 
SNDP 

We must take care when selecting the a value in ©, as it is a part of the model specification and 
can thus affect the estimates (Lee, 2007). We treat cr as a parameter and estimate its value along with the 
other parameters. Since a determines the quantile level of the mode for all models considered here, our 
approach to modelling the first stage regression can also be regarded as a kind of mode regression (see 
Wichitaksorn et al., 2014). 

To gain further flexibility, we might extend the model through a fully nonparametric mixture. Several 
semiparametric models in the context of Bayesian quantile regression with exogenous variables have been 
proposed by Kottas and Gelfand (2001), Kottas and Krnjajic (2009), and Reich et al. (2010). For example, 
Kottas and Krnjajic (2009) considered the nonparametric mixture of uniform distributions for any unimodal 
density on the real line with the quantile restriction at the mode using the Dirichlet process mixture (see also 
Kottas and Gelfand, 2001). In the more flexible model proposed by Reich et al. (2010), the mode of the 
error distribution does not have to coincide with zero. This is achieved by using a nonparametric mixture 
of the quantile-restricted two-component mixtures of normal distributions. However, their approaches are 
not directly applicable in the present context where the value of a is estimated. If we were to estimate the 
quantile level for which the quantile restriction holds, the computation under the former model is expected 
to be extremely inefficient and unstable as the model involves many indicator functions, and a and the 
intercept would be highly correlated. The intercept would not be identifiable in the latter model. 

We could further extend the model to account for heteroskedasticity such that 

di = z-7 + z [nvi, (13) 

for * = 1,..., n, where z'k. > 0 for all i and the first element of k is fixed to one ( e.g . Reich, 2010). In 
this case, the a-th quantile of d is given by Qd\ z ( a ) = z *7 + z£fcQ„| z( a ) = z i(7 + K Qv\z ( a )) as i n the 
usual quantile regression. However, since the first stage regression model is built based on ©, models © 
and (fT3l) would produce identical estimates. 
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3.3 Second Stage Regression 

We next turn to the model of the new second stage error, e*, in ([5]). Since the p -th conditional quantile 
of ei is now zero, we assume that e* ~ A£(cr,p), i = 1,... , n, as in the standard Bayesian quantile 
regression approach. We utilise the location scale mixture of normals representation for the asymmetric 
Laplace distribution to facilitate an efficient MCMC method following Kozumi and Kobayashi (2011) (see 
also Kotz et al., 2001). The model is expressed in the hierarchical form given by 

lH = max {?/*, 0 } , 

y* ~ A/"(x ifip + 0 p gt , r 2 agt), 

9i ~ £{&), 

for * = 1,, n, where x* = (x(, di, d\ — z' 7 )', 0 p = ((3' p , S p , g p )', £(a) denotes the exponential distribu¬ 
tion with mean a, and 

1 - 2p o 2 

° U p(l-p)’ Tp p(l-p)‘ (14) 


3.4 Prior Distributions 

The coefficient parameter 7 is common to all first stage regression specifications. First, we assume the 
normal prior for 7 , since it is computationally convenient for the AL, SN, ALDP, and SNDP models. Since 
we do not have information on the coefficient values, the variances are set such that the prior distributions 
are relatively diffuse. Our default choice is 7 ~ A ,r (0.1001). For the scale parameters, </> for the AL, SN, 
and AEP distributions, a relatively diffuse inverse gamma distribution is assumed and the default choice is 
set to Z(/(0.1, 0.1). For AEP, we assume Cj ~ TA//o j 00 )(l, 1), where TJ\f (ajqiji. er 2 ) denotes the normal 
distribution with the mean // and variance a 2 truncated on the interval (a, b). A similar prior specification is 
found in Naranjo et al. (2015). For all models, a ~ U(0, 1) is assumed. 

For the semiparametric models, we need to specify the parameters of the inverse gamma base measure. 
Assuming that the data have been rescaled, co and do are chosen such that the variance of v % takes values 
between 0 and 3 with high probability ( e.g. Ishwaran and James, 2002). Our default choice is co = 2 
and do = 0.5 for ALDP and cq = do = 1.5 for SNDP. Under this choice, when a = 0.5 for ALDP, 
Pr (fa < ^32)78) = 0.802 as Var(ni) = 8 cj) 2 . Similarly, when cr = 0.4, Pr(^ < ^3.0/0.332) = 0.784. 
For SNDP, Pr(</>/ < 3) = 0.801 when a = 0.5 and Pr(0 Z < 3/1.104) = 0.775 when a = 0.4. For the 
precision parameter of the Dirichlet process, a, we assume a ~ <7(2, 2) such that both small and large values 
for a, hence the number of clusters, are allowed. 

For the coefficient parameters in the second stage, /3 p and 5 P , we also assume relatively diffuse normal 
distributions. Our default choice of prior is {f3' p ,g p )' ~ tV(0, 1001). Similar to cj) in the parametric first 
stage, we assume an inverse gamma prior for the scale of the AL pseudo likelihood. Our default choice is 

xg( 0.1,0.1). 

The parameter r) p accounts for the endogeneity and we need to take care in prior elicitation. When the 
data follow the bivariate normal distribution, as in the motivating example ( 01 , q p is equal to pa\ jo 2 , where 






p is the correlation coefficient and a\ and oi are the standard deviations of the first and second stage errors, 
respectively. In this case, we may follow Lopes and Poison (2014) to determine the variance of the normal 
prior implied from an inverse Wishart prior for the covariance matrix. However, we do not limit ourselves to 
normal data as the quantile regression approach is suitable for heteroskedastic and non-normal data, and the 
non-normal models are used in the first stage. In the literature on Bayesian non-normal selection models, 
the prior distribution of rj p is normal typically with a very small variance, such as 1/2 (e.g. Munkin and 
Trivedi, 2003, 2008; Deb et al., 2006). On the other hand, we use a more diffused prior to reflect our 
ignorance about rj p and set our default choice of prior to be rj p ~ A r ((). 5). When the instrument is weak, 
it is expected that our quantile regression models face the problem of prior sensitivity and that the posterior 
distributions exhibit sharp behaviour, as in the case of the Bayesian instrumental variable regression model. 
Section 0] considers the alternative choices of the hyperparameters to study the prior sensitivity. 

3.5 MCMC Method 

The proposed models are estimated by using the MCMC method based on the Gibbs sampler. We describe 
the Gibbs sampler for the semiparametric models with ALDP and SNDP, which is an extension of the Gibbs 
sampler described in Kozumi and Kobayashi (2011) and Ogasawara and Kobayashi (2015). The algorithms 
for the AL and SN models can be obtained straightforwardly. We also mention the algorithm for the AEP 
model. 

The variables involved in the Dirichlet process are sampled by using the retrospective sampler (Pa- 
paspiliopoulos and Roberts, 2008) and the slice sampler (Walker, 2007). First, we introduce m ~ 77(0,1) 
and ki , i = 1,..., n, such that i q = Pr (hi = l),l = 1,..., oo. Then, as in Walker (2007), the Gibbs 
sampler is constructed by working on the following joint densities 

OO 

fALDp(Vi,Ui ) = < UJi)fAL(Vi\(j)l,a), 

1=1 

oo 

fsNDp(vi,Ui ) = ^/(rq < ui)fsN(vi\<pi,ci), 
i=i 

where 4>i Go, 7 Ti = up nz<r'( 1 - Ur), W l 73(1, o), and B(a,b) denotes the beta distribution with the 
parameters a and b (Sethuraman, 1994). We also let k* denote the minimum integer such that Y2i=i 7r / > 
1 - min {up.. . ,u n }. 

Algorithm for ALDP 

For the AFDP model, we utilise the mixture representation for the asymmetric Faplace distribution to sample 
7 efficiently such that Vi\hi ~ Af (9 a h t . Tgd-jhj), hi ~ £(4>i), i = 1 ,n, where 9 a and r/ are defined 
as in m. Let us denote /3 p = (/!/. <) p . : rj p )' and x, = (x(, v t — z^)'. Our Gibbs sampler proceeds by 
alternately sampling {rq}/ =1 , { ^i}T=i, {<&}?!v °> 7, {MiLp {v*}i=v Pp> cr > and {&}?= p 

• Sampling {'«?;}/ =1 : Generate u t from 77(0, tt/. j ) for i = 1, n. 
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• Sampling {cu;}f =1 : Generate w; from B{ 1 + ni,n — X]r<z n r + a ) where n; = ^)” = 1 1(ki = Z) for 
l = l,...,k*. 

• Sampling {/cj}” =1 : Generate from the multinomial distribution with probabilities 

Pr(fci = Z) oc fAL{di - •z! i 'y\4) h a)I{u i < 7 p), 1 = 1, 


for * = 1 ,, n. 

Sampling { 4>i } f_ x : Generate (fit from lQ(ci . d[ ) where 


q = 1.5rp + c 0 , di = y, hi + 

i:ki=l L 


(di - z '7 - 6 > a /p) ; 


+ do- 


Sampling a: Assuming the gamma prior, G(ao,bo), we use the method described by Escobar and 
West (1995) to sample a. By introducing c ~ B(a + 1, n), the full conditional distribution of a is the 
mixture of two gamma distributions given by 

<yA?(a 0 + n*,b 0 - logc) + (1 - ip)G(ao + n* - l,b 0 - logc), 

where n* is the number of distinct clusters and ip/(l — tp) = (ao + n* — l)/(n(bo — log c)). 

Sampling 7 : Assuming 7 ~ A^(go, Go), 7 is sampled from Af(gi, Gi) where 

-1 


Gi = 


E 

U =1 


z i 


dp + 1 


rfagi T%<t) ki hi 


z i + G 0 1 


gi = Gi 


E 


,i =1 


Z i - 


Vp(y* - *iPp - dpdj - 0 p gi) dj - e Q hj 


T p (j g i 


T a ( t>kihi 

as the density of the full conditional distribution denoted by 77 ( 7 1—) is given by 

r ” (V* ~ APp ~ dpdi - rj p (di - z' 7 ) - 9 p gi) 2 


+ G 0 x g 0 


tt( 7 |-) oc exp < — ^ 


i= 1 


exp \ - ]T 


(■di - z);7) 2 

^ ^aPkihi 


a gt 


ex P \ -^(7 - go)'G 0 *(7 - g 0 ) 


oc exp<| --( 7 -gt)'G 1 1 ( 7 -gi; 


Sampling {/ij}” =1 : The full conditional distribution of h t is the generalised inverse Gaussian distri¬ 
bution, denoted by QTQ(v, £, x). The probability density function of QTQ(y, £, x) is given by 


fix\v,£,x) = 


2 AE(^x)' 


x 1 " 1 exp <—(Z; x 1 + x^az) f , x > 0, —oo<u<oo, £,x>0, 


where K u (-) is the modified Bessel function of the third kind (Barndorff-Nielsen and Shephard, 2001). 
For i = I, n, we sample hi from QTQ(l/2, G, Xi) where 


pi = (■ d i - z b ) 2 2 = 
T i(pki 


el 


+ 


T a ( t ) ki Pki 
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• Sampling a: The density of the full conditional distribution of a is given by 

n 

7r(a|—) oc Tr(a)Y\fAL(di - zGy| <j) ki ,a), 

2=1 

where 7 r(a|—) andTrfoj denote the full conditional and prior density of a, respectively. We use the 
random walk Metropolis-Hastings (MH) algorithm to sample from this distribution. 

• Sampling {y* }- l =1 : The full conditional distribution of y* is given by 

yiliyi > 0 ) +TJ'J’(- oofi )(x , iP p + 6 P gi,Tpagi)I(yi = 0 ), i = l,...,n. 


Sampling (3 p : We sample f3 p = (/ 3' p ,5 p ,r] p )' in one block. Assuming (3 p ~ A r (b ( ). B 0 ), the full 
conditional distribution is given by J\T{hi, Bi) where 


Bi = 


i -i 


E 


X.;X • 


+ B r 


—J TpCrgi 


bi = Bi 


My* - 0p9i) 


. 2=1 


T^crgi 


+ B 0 bo 


• Sampling a: Assuming a ~ IQ {mo, sq), we sample cr from IQ {mi, si) where rn \ = 1.5n + mo 
and si = Ya= i 9i + E"=i (Vi ~ k iPp ~ e p9i) 2 / 2r p9i + - s o- 


• Sampling {c^}” =1 : Similar to hi, g t is sampled from QXQ{ 1/2, A i,ip) where 


K = 


(:yi - x-'Ap ) 2 


1 p w 


,2 6 l 2 

w = H—, i = 1,... ,n. 

T p a ° 


Algorithm for SNDP 

The Gibbs sampler for SNDP consists of sampling }” =1 , {^i}\ c =1 , {</>/}f =1 , a, 7 , a, {y *}” =1 , 

ftp, <?, and {pi}” =1 . The sampling algorithms for {«j}” =1 , {wz}fl ls a, {y*}” =1 , /3 p , a, and {gj}” =1 remain 
the same as in the case of ALDP. The sampling scheme of {/c ,;}" =1 and a can be obtained by replacing 
f A L{di - z- 7 | (j) ki ,a) with f S N{di - z- 7 | (j) ki ,a). 

Si mi lar to the case of ALDP, the density of the full conditional distribution is given by 


tt(7|-) oc exp | -^(7 - gt(7))'Gi(7) 1 {l ~ gi(7)) 


where 

Gi( 7 ) 

gt(7) 


l_2=l 

Gi( 7 ) 


9p . 4(a — I{di < z' 7)) 2 

o I 


< + Go 1 


-1 


TpVgi ftki 

Vp(y* - x'iftp - Vpdi ~ 0 p gi) 4di{a- I{dk < z' 7)) 2 


E 

2=1 


Zi - 


■Tpvgi 


+ 


ftki 


+ G 0 1 go 


which is si mi lar to the density of the normal distribution. Therefore, we sample 7 by using the MH algorithm 
with the proposal distribution given by Af{gi ( 7 ), Gi ( 7 )). 
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Algorithm for AEP 

Since no convenient representation for the AEP distribution is available, the full conditional distributions of 
the parameters in the first stage regression, 7 , cj), a, Ci, and ( 2 , are not in the standard forms. Therefore, we 
employ the adaptive random walk MH algorithm. Although Naranjo et al. (2015) proposed the scale mixture 
of uniform representation for the AEP distribution, the algorithm based on this representation would be 
inefficient, because it consists of sampling from a series of distributions that are truncated on some intervals 
such that the mixture representation holds and such intervals move quite slowly as sampling proceeds (see 
also Kobayashi, 2015). Since the additional shape parameters in AEP free up the role of a, a controls 
the overall skewness by allocating the weights on the left and right sides of the mode. Hence, the MCMC 
sample would exhibit relatively high correlation between a and 70. 

4 Simulation Study 

The models considered in the previous section are demonstrated using simulated data. The aims of this 
section are (1) to compare the performance of the proposed models (Section l4~2l ). (2) to study the sensitivity 
to the prior settings, and (3) to illustrate the behaviour of the posterior distribution when the instrument is 
weak (Section 1431) . 

4.1 Settings 

The data are generated from the model given by 

y* = Po + Pi Xi + 5di + r]Vi + ei, 

( 15 ) 

di = 70 + 71 Xi + 72 Wi + Vi, 

for i = 1 ,..., 300, where ( 70 , 71 , 72 ) = ( 0 , 1 ,1.5) assuming that a valid instrument is available, (Po, Pi, 8, rj) 
( 0 , 1 , 1 , 0 . 6 ), Xi ~ A7(0, 1 ), and Wi ~ TN (o,oo)(l> !)• The performance of the models is compared by con¬ 
sidering the various settings for Vi, while the distributions of 7 are kept relatively simple in order that the 
true values of the quantile regression coefficients are tractable. The following five settings are considered: 

Setting 1 Vi ~ A7(0,1), e* ~ A/"(0,1 — rj 2 ). 

Setting 2 Vi ~ £ 4 , e* ~ to, 

Setting 3 Vi ~ ST(— 0.430, 1 , 0.980,4), e* ~ to, 

Setting 4 Vi ~ A/"(0, (1 + 0.5 Wi) 2 ), ei ~ jV(0,1 — rj 2 ), 

Setting 5 Vi ^ ST{— 0.430, (1 + 0.5tCj) 2 , 0.980,4), e t ~ t%, 

where ST(/x,a 2 ,a,u) denotes the skew t distribution with the location parameter //, scale parameter <r 2 , 
skewness parameter a = 8/y/l — S 2 , 5 € (—1,1), and degree of freedom v (see Azzalini and Capi- 
tanio, 2003; Friihwirth-Schnatter and Pyne, 2010), and we set 6 = 0.7. In Setting 1, the error terms follow 
the bivariate normal distribution as in the motivating example in Section [2] Setting 2 considers the fat tailed 
first stage regression. Setting 3 considers a more difficult situation where the first stage error is fat tailed 
and skewed. Setting 4 replaces the first stage error of Setting 1 with the heteroskedastic error with respect 
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to the instrument. Setting 5 is also a challenging situation where the first stage error is fat tailed, skewed, 
and heteroskedastic. In Settings 3 and 5, the location parameters of the first stage error distributions are set 
such that the mode of Vi is zero and the quantile level of the mode is 0.435. The average censoring rates for 
the settings are around 0.25. For each setting, the data are replicated 100 times. 

4.2 Results under the Default Priors 

We first estimated the proposed models under the default prior specifications (see Section lT4l ) for p = 0.1 
and 0.5 by running the MCMC for 20000 iterations and discarding the first 5000 draws as the burn-in period. 
The standard Bayesian TQR model was also estimated. The bias and root mean squared error (RMSE) of 
the parameters were computed over the 100 replications. To assess the efficiency of the MCMC algorithm, 
we also recorded the inefficiency factor, which was defined as a ratio of the numerical variance of the sample 
mean of the Markov chain to the variance of the independence draws (Chib, 2001). 

Table Q] presents the biases, RMSEs, and median inefficiency factors for the parameters over the 100 
replications. First, we examined the inefficiency factors. Overall, our sampling algorithms appear to be 
efficient, especially for AL, SN, ALDR and SNDR The table shows that the inefficiency factors for AL, SN, 
ALDR and SNDP are reasonably small for ( 3 p i, 5 P , r) p , 71 , and 72 . Since a and 70 determine the quantile 
level of the mode and location of the mode, respectively, the MCMC sample exhibits correlation between a 
and 70 and this results in higher inefficiency factors for them. Flence, the inefficiency factors for 8 p o tend 
to be higher than those for the other parameters. This pattern is more profound in the case of AEP where 
the inefficiency factors for a, 70, and 8 p 0 are quite high. Since the additional shape parameters in AEP 
free up the role of a, the MCMC sample exhibits higher correlation between a and 70. Furthermore, the 
inefficiency factors for the other parameters for AEP are also higher than those for the other endogenous 
models. 

Next, we turn to the performance of the models. As expected, TQR produces biased estimates in all 
cases. The RMSEs for the proposed endogenous models are generally larger for p = 0.1, which is below 
the censoring point, than for p = 0.5. The AL and ALDP models result in similar performance. The 
AEP model shows the largest RMSEs for 70 and (3 p 0 among the proposed models for all cases. Combined 
with the high inefficiency factors for those parameters, the convergence of the MCMC algorithm for AEP 
may be difficult to ensure in the given simulation setting. This finding suggests a considerable practical 
limitation and, thus, AEP will not be considered henceforth. The same limitation applies to the potentially 
more flexible nonparametric models discussed in Section [3721 

TableQ]also shows that the estimation of the first stage regression can influence the second stage param¬ 
eters. For example, in Setting 1, the RMSEs for 70 for SN and SNDP are smaller than those for AL and 
ALDP, as the true model is the normal and thus SN and SNDP produce smaller RMSEs for 8 P o- Similarly, 
in Setting 4, the RMSEs for 8 p 0 for SN and SNDP are smaller than those for AL and ALDP In addition, the 
heteroskedasticity in the first stage influences the performance of the slope parameters, resulting in slightly 
smaller RMSEs for f3 P 1 f° r SN and SNDP than for AL and ALDP. However, the performance of the SN 
model becomes worse when the first stage error is fat tailed, since the skew normal distribution cannot 
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accommodate a fat tailed distribution. While the results in Setting 2 are somewhat comparable across the 
models, the table shows that SN results in larger biases and RMSEs in Setting 3 and, especially, Setting 5. 
In Setting 3, SN results in larger RMSEs for (3 P o than for AL, ALDR and SNDR In Setting 5, given the 
heteroskedasticity of the first stage, the biases and RMSEs for the intercept and slope parameters for SN 
are larger than those for AL, ALDP, and SNDR On the other hand, compared with SN, the semiparametric 
SNDP model is able to cope with fat tailed errors and this produces results comparable with those for AL 
and ALDR 

While the models result in reasonable overall performance, the results for Settings 3 and 5 also illustrate 
the limitation of our modelling approach to some extent. In Setting 3, the models exhibit some bias in f3 p o 
because of the lack of fit in the first stage. This lack of fit, which is represented by the bias for 70 , is reflected 
in the bias for (3 P 0 . The entire coefficient vector may be influenced by this lack of fit in the first stage in 
the presence of heteroskedasticity as in Setting 5. The lack of fit in the first stage is also indicated by the 
biases in a. This finding implies that an inflexible first stage model can fail to estimate the true quantile such 
that ([HJ holds and that choosing the value of cr a priori could lead to biased estimates (see the discussion in 
Section |T2T ). 
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Table 1: Biases, RMSEs, and inefficiency factors under the default priors 

TQR AL SN AEP ALDP SNDP 


Setting p Parameter Bias RMSE IF Bias RMSE IF Bias RMSE IF Bias RMSE IF Bias RMSE IF Bias RMSE IF 


I (IT -0.474 0.511 37.1 0.048 0.239 55.5 0.047 

flpl -0.248 0.272 17.0 -0.022 0.139 22.3 -0.020 

d p 0.200 0.212 27.4 -0.009 0.092 24.9 -0.007 
t] p 0.001 0.122 18.5 -0.001 

70 0.001 0.204 54.7 0.000 

71 -0.012 0.066 16.7 -0.006 

72 -0.004 0.086 17.3 -0.002 

a -0.002 0.052 66.1 -0.001 


0.211 59.7 0.066 0.252 245.0 0.047 0.237 57.1 0.047 0.209 61.8 

0.134 18.9 -0.020 0.136 43.1 -0.022 0.140 24.7 -0.020 0.135 20.1 

0.085 20.0 -0.007 0.085 46.3 -0.008 0.092 24.6 -0.007 0.085 21.1 

0.120 14.5 -0.001 0.120 29.8 0.000 0.122 17.5 -0.001 0.120 16.1 

0.165 59.2 0.036 0.264 340.8 0.000 0.206 53.2 0.003 0.160 60.7 

0.058 9.2 -0.007 0.060 96.3 -0.012 0.067 17.7 -0.007 0.059 9.6 

0.074 9.2 -0.002 0.075 93.7 -0.004 0.086 16.9 -0.003 0.074 8.9 

0.043 72.2 0.011 0.089 357.1 -0.002 0.054 65.0 -0.000 0.042 76.4 



PpO 

-0.754 

0.768 

9.8 -0.042 

0.201 

16.0 -0.088 

0.257 

ft pi 

-0.394 

0.406 

7.0 0.052 

0.148 

15.2 0.068 

0.181 

Op 

0.430 

0.432 

7.6 -0.042 

0.102 

15.1 -0.059 

0.124 

Vp 



0.042 

0.115 

13.5 0.059 

0.134 

70 



-0.093 

0.239 

18.7 -0.158 

0.361 

7i 



0.002 

0.088 

12.6 -0.000 

0.127 

72 



-0.058 

0.162 

19.3 -0.081 

0.203 

a 



-0.042 

0.062 

38.0 -0.057 

0.094 


21.8 0.093 0.333 123.5 -0.043 0.206 17.3 -0.061 0.200 21.5 

13.9 0.049 0.147 35.1 0.052 0.148 14.4 0.052 0.154 12.1 

15.0 -0.040 0.100 41.7 -0.042 0.101 16.2 -0.045 0.105 14.0 

15.2 0.040 0.112 33.1 0.042 0.115 15.3 0.045 0.117 13.2 

19.9 0.113 0.420 177.0 -0.095 0.247 21.4 -0.122 0.243 26.9 

9.5 0.000 0.086 59.9 0.001 0.088 14.4 -0.002 0.090 8.8 

14.7 -0.052 0.155 75.0 -0.058 0.163 21.9 -0.064 0.161 16.3 

42.0 0.018 0.104 209.2 -0.043 0.063 44.4 -0.049 0.067 54.2 
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4.3 Alternative Base Measures and Prior Specifications 

For comparison puiposes, we consider two alternative specifications for the inverse gamma base measure for 
the semiparametric models. The following slightly less diffuse settings than the default are considered. For 
ALDP, we consider ZC?(2.5,0.6) such that Pr(<^ < a/ 3/8) = 0.854 and Z^(3.0,0.7) such that Pr {<f>i < 
-/3/8) = 0.891 when a = 0.5. For SNDP, we consider ZQ{ 2,2) such that Pr(^> < 3) = 0.852 and 
ZQ{ 2.5, 2.5) such that Pr( 0 / < 3) = 0.893. For the other parameters, we use the default prior specifications. 
Table |2]pi'esents the biases and RMSEs for ALDP and SNDP under the alternative base measures for p = 0.1 
and p = 0.5. The results in Table |2] are essentially identical to those in Table [I] suggesting that the default 
choice of the base measures provides reasonable performance. 


Table 2: Biases and RMSEs for ALDP and SNDP under the alternative base measures 






ALDP 



SNDP 





Alternative 1 

Alternative 2 

Alternative 1 

Alternative 2 

Setting 

p 

Parameter 

Bias 

RMSE 

Bias 

RMSE 

Bias 

RMSE 

Bias 

RMSE 

1 

0.1 

(3p0 

0.050 

0.239 

0.048 

0.239 

0.050 

0.211 

0.047 

0.209 



Ppi 

-0.022 

0.140 

-0.022 

0.139 

-0.020 

0.136 

-0.020 

0.135 



5p 

-0.008 

0.092 

-0.009 

0.092 

-0.008 

0.085 

-0.007 

0.086 


0.5 

flpO 

0.018 

0.182 

0.017 

0.183 

0.017 

0.164 

0.019 

0.164 



Ppi 

-0.002 

0.089 

-0.001 

0.089 

0.001 

0.087 

0.001 

0.087 



5p 

-0.004 

0.063 

-0.004 

0.063 

-0.003 

0.061 

-0.003 

0.061 

2 

0.1 

PpO 

0.092 

0.304 

0.088 

0.304 

0.091 

0.302 

0.094 

0.302 



Ppi 

-0.008 

0.160 

-0.007 

0.160 

-0.011 

0.159 

-0.012 

0.158 



5p 

-0.025 

0.115 

-0.025 

0.116 

-0.022 

0.114 

-0.023 

0.114 


0.5 

fipO 

-0.001 

0.199 

-0.001 

0.200 

0.003 

0.194 

0.002 

0.194 



(3pi 

0.003 

0.127 

0.003 

0.126 

0.000 

0.124 

-0.001 

0.123 



5p 

-0.003 

0.082 

-0.004 

0.082 

-0.002 

0.081 

-0.001 

0.081 

3 

0.1 

fipO 

0.025 

0.286 

0.025 

0.287 

0.024 

0.280 

0.022 

0.282 



Ppi 

-0.008 

0.179 

-0.009 

0.180 

-0.010 

0.182 

-0.010 

0.182 



5p 

-0.020 

0.119 

-0.021 

0.120 

-0.021 

0.118 

-0.021 

0.118 


0.5 

(3p0 

-0.056 

0.193 

-0.056 

0.192 

-0.056 

0.183 

-0.056 

0.184 



P pi 

0.013 

0.117 

0.012 

0.117 

0.012 

0.120 

0.011 

0.119 



5p 

-0.006 

0.072 

-0.006 

0.072 

-0.006 

0.072 

-0.006 

0.072 

4 

0.1 

(3p0 

0.053 

0.246 

0.054 

0.246 

0.063 

0.237 

0.063 

0.236 



(3pi 

0.002 

0.152 

0.002 

0.150 

0.004 

0.143 

0.003 

0.145 



5p 

-0.018 

0.102 

-0.017 

0.101 

-0.017 

0.097 

-0.016 

0.098 


0.5 

/ 3p0 

0.003 

0.202 

0.003 

0.203 

0.015 

0.193 

0.013 

0.191 



(3 pi 

-0.003 

0.140 

-0.002 

0.143 

-0.003 

0.130 

-0.002 

0.129 



5p 

-0.009 

0.096 

-0.010 

0.098 

-0.007 

0.086 

-0.007 

0.086 

5 

0.1 

(3p0 

0.011 

0.300 

0.010 

0.301 

-0.010 

0.296 

-0.014 

0.295 



(3 pi 

0.030 

0.200 

0.031 

0.199 

0.032 

0.207 

0.031 

0.206 



5p 

-0.050 

0.148 

-0.050 

0.147 

-0.054 

0.150 

-0.054 

0.150 


0.5 

f3po 

-0.046 

0.206 

-0.046 

0.206 

-0.065 

0.202 

-0.073 

0.206 



(3 pi 

0.052 

0.148 

0.053 

0.148 

0.053 

0.154 

0.055 

0.156 



Sp 

-0.042 

0.101 

-0.043 

0.102 

-0.046 

0.106 

-0.046 

0.107 


Next, the two alternative prior specifications for rj p , a, and O are considered to study the prior sen¬ 
sitivity. The first alternative specification considers the more diffuse priors given by rj p ~ A/0,25), 
a ~ ZQ{ 0.1, 0.1), and (j) ~ ZQ{ 0.01,0.01). The second alternative specification is the even more dif¬ 
fuse setting given by rj p ~ A/"(0,100), a ~ Zt?(0.001,0.001), and <f> ~ ZQ{ 0.001, 0.001). For ALDP and 
SNDP, the default base measures are used. For f3 p , 5 P , and 7 , we use the default specification. Table [3] 
presents the biases and RMSEs for AL, SN, ALDP, and SNDP under the five simulation settings for p = 0.1 
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and 0.5, showing that the result is robust with respect to the choice of hyperparameters. We also considered 
some different prior choices for ((3' p , d p )' and 7 , and obtained robust results. 


Table 3: Biases and RMSEs under the alternative priors for a, r, and r/ p 






AL 



SN 



ALDP 



SNDP 





Alternative 1 

Alternative 2 

Alternative 1 

Alternative 2 

Alternative 1 

Alternative 2 

Alternative 1 

Alternative 2 

Setting 

P 

Parameter 

Bias 

RMSE 

Bias 

RMSE 

Bias 

RMSE 

Bias 

RMSE 

Bias 

RMSE 

Bias 

RMSE 

Bias 

RMSE 

Bias 

RMSE 

1 

0.1 

(3 p o 

0.049 

0.239 

0.050 

0.240 

0.050 

0.212 

0.050 

0.211 

0.050 

0.237 

0.049 

0.239 

0.051 

0.211 

0.051 

0.211 



Ppi 

-0.022 

0.140 

-0.021 

0.140 

-0.019 

0.135 

-0.019 

0.135 

-0.022 

0.139 

-0.022 

0.139 

-0.021 

0.135 

-0.020 

0.135 



Sp 

-0.009 

0.092 

-0.009 

0.092 

-0.008 

0.085 

-0.008 

0.085 

-0.009 

0.092 

-0.009 

0.093 

-0.008 

0.085 

-0.008 

0.085 


0.5 

(3 P o 

0.018 

0.180 

0.017 

0.181 

0.018 

0.166 

0.019 

0.168 

0.018 

0.183 

0.016 

0.180 

0.019 

0.165 

0.018 

0.165 



(3pi 

-0.001 

0.089 

-0.001 

0.089 

0.001 

0.087 

0.002 

0.087 

-0.001 

0.089 

-0.001 

0.089 

0.001 

0.087 

0.001 

0.087 



Sp 

-0.004 

0.063 

-0.004 

0.063 

-0.003 

0.061 

-0.003 

0.061 

-0.004 

0.063 

-0.004 

0.063 

-0.003 

0.061 

-0.003 

0.061 

2 

0.1 

PpO 

0.091 

0.304 

0.088 

0.304 

0.084 

0.312 

0.081 

0.311 

0.092 

0.305 

0.090 

0.307 

0.094 

0.300 

0.097 

0.305 



(3pi 

-0.008 

0.160 

-0.008 

0.160 

-0.008 

0.164 

-0.008 

0.164 

-0.007 

0.159 

-0.007 

0.160 

-0.010 

0.159 

-0.010 

0.159 



Sp 

-0.025 

0.116 

-0.025 

0.115 

-0.024 

0.116 

-0.023 

0.115 

-0.025 

0.115 

-0.025 

0.116 

-0.023 

0.113 

-0.024 

0.114 


0.5 

(3 p o 

-0.001 

0.198 

-0.001 

0.199 

-0.011 

0.207 

-0.011 

0.209 

0.002 

0.199 

0.001 

0.200 

0.005 

0.196 

0.005 

0.195 



Ppi 

0.003 

0.126 

0.003 

0.126 

0.003 

0.130 

0.002 

0.129 

0.003 

0.126 

0.003 

0.127 

-0.000 

0.123 

0.000 

0.123 



Sp 

-0.004 

0.082 

-0.004 

0.082 

-0.002 

0.084 

-0.002 

0.083 

-0.004 

0.082 

-0.004 

0.082 

-0.002 

0.081 

-0.002 

0.081 

3 

0.1 

(3pO 

0.027 

0.284 

0.028 

0.287 

-0.004 

0.301 

-0.006 

0.301 

0.028 

0.286 

0.027 

0.287 

0.024 

0.282 

0.026 

0.282 



(3pi 

-0.008 

0.180 

-0.007 

0.180 

-0.007 

0.182 

-0.007 

0.182 

-0.008 

0.180 

-0.008 

0.180 

-0.009 

0.181 

-0.009 

0.182 



Sp 

-0.022 

0.120 

-0.022 

0.120 

-0.022 

0.118 

-0.022 

0.118 

-0.021 

0.120 

-0.022 

0.120 

-0.021 

0.119 

-0.021 

0.118 


0.5 

(3p0 

-0.054 

0.191 

-0.053 

0.191 

-0.083 

0.223 

-0.084 

0.223 

-0.054 

0.193 

-0.056 

0.192 

-0.055 

0.182 

-0.054 

0.183 



(3pi 

0.013 

0.117 

0.014 

0.117 

0.014 

0.122 

0.015 

0.123 

0.013 

0.117 

0.013 

0.117 

0.012 

0.119 

0.012 

0.120 



Sp 

-0.006 

0.072 

-0.006 

0.072 

-0.008 

0.071 

-0.007 

0.071 

-0.007 

0.072 

-0.006 

0.072 

-0.006 

0.072 

-0.007 

0.072 

4 

0.1 

(3p0 

0.056 

0.245 

0.055 

0.246 

0.066 

0.232 

0.066 

0.231 

0.054 

0.245 

0.054 

0.247 

0.063 

0.235 

0.063 

0.233 



(3pi 

0.004 

0.152 

0.002 

0.151 

0.005 

0.142 

0.006 

0.143 

0.003 

0.151 

0.003 

0.150 

0.004 

0.143 

0.005 

0.145 



Sp 

-0.019 

0.103 

-0.018 

0.102 

-0.019 

0.100 

-0.020 

0.101 

-0.018 

0.102 

-0.018 

0.101 

-0.017 

0.098 

-0.018 

0.099 


0.5 

(3p0 

0.006 

0.198 

0.007 

0.200 

0.017 

0.186 

0.020 

0.189 

0.006 

0.202 

0.005 

0.202 

0.017 

0.191 

0.016 

0.191 



(3pi 

-0.003 

0.141 

-0.003 

0.142 

0.001 

0.129 

0.001 

0.129 

-0.002 

0.143 

-0.002 

0.143 

-0.001 

0.131 

-0.002 

0.129 



Sp 

-0.011 

0.097 

-0.010 

0.097 

-0.011 

0.089 

-0.011 

0.089 

-0.011 

0.099 

-0.010 

0.097 

-0.009 

0.087 

-0.008 

0.085 

5 

0.1 

(3p0 

0.017 

0.299 

0.017 

0.301 

-0.029 

0.342 

-0.028 

0.340 

0.018 

0.302 

0.014 

0.300 

-0.003 

0.296 

-0.003 

0.294 



(3pi 

0.030 

0.199 

0.030 

0.201 

0.045 

0.223 

0.045 

0.224 

0.031 

0.200 

0.030 

0.198 

0.032 

0.207 

0.032 

0.207 



Sp 

-0.050 

0.147 

-0.050 

0.148 

-0.067 

0.161 

-0.068 

0.164 

-0.051 

0.149 

-0.050 

0.148 

-0.053 

0.149 

-0.052 

0.148 


0.5 

(3p0 

-0.041 

0.199 

-0.041 

0.201 

-0.085 

0.256 

-0.087 

0.256 

-0.043 

0.206 

-0.042 

0.206 

-0.059 

0.199 

-0.061 

0.201 



(3pi 

0.053 

0.148 

0.053 

0.149 

0.069 

0.181 

0.070 

0.182 

0.053 

0.148 

0.053 

0.148 

0.054 

0.155 

0.054 

0.155 



Sp 

-0.042 

0.102 

-0.042 

0.102 

-0.059 

0.123 

-0.059 

0.124 

-0.042 

0.102 

-0.043 

0.103 

-0.045 

0.106 

-0.046 

0.107 


These findings thus confirm the robustness of the results with respect to the choice of base measures and 
prior distributions provided that a valid instrument is available. In the context of mean regression models, 
however, when the instrument is weak, the posterior distribution is known to exhibit sharp behaviour in the 
vicinity of non-identifiability (Hoogerheide et al., 2007b) and the posterior distribution is greatly affected 
by the prior specification ( e.g. Lopes and Poison, 2014). 

Here, we illustrate the behaviour of the posterior distribution by using a weak instrument. The data are 
generated from © without the regressor: 


y* = 5di + r}Vi + e*, 
di = 7 Wi + 


(16) 


for i = 1 ,..., 300, where 7 = 0 . 1 , (6, rf) = ( 1 , 0 . 6 ), W{ ~ 7V(0, 1 ), Vi ~ 7V(0, 1 ), and e* ~ AA( 0,1 — rj 2 ). 

The AL and SN models are estimated for p = 0.1 by running the MCMC for 20000 iterations and discarding 
the first 5000 draws as the burn-in period under the three prior specifications previously considered. 
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Figure |2]pi'esents the joint posterior distribution of (5,7) and (8, rj) for AL and SN under the three prior 
specifications and shows that the posterior distribution is greatly affected by the prior specification. The 
posterior distribution of <5 becomes more diffuse as 7 approaches zero. This trend becomes more profound 
as we use more diffuse prior distributions, producing star shapes. The figure also suggests that the prior 
distribution can act as an informative prior about the linear relationship between 8 and 7. Similar results 
were also obtained under different prior specifications for f3 p , 8 P , and 7 as well as for ALDP and SNDP. 


default 


default 


alternative 1 


alternative 2 



default 


default 


alternative 1 


alternative 2 



Figure 2: Joint posterior of (8, 7) and (5, //) for AL (top row) and SN (bottom row) 


5 Application: Labour Force Participation of Married Women 

The proposed endogenous models are applied to the dataset on the labour supply of married women of 
Mroz (1987). The dataset includes observations on 753 individuals. The response variable is the total 
number of hours in every 100 hours the wife worked for a wage outside the home during 1975. In the 
data, 325 of the 753 women worked zero hours and the corresponding responses are treated as left censored 
at zero. Hence, the censoring rate is approximately 0.43. The regressors of our model include years of 
education ( educ ), years of experience ( exper ) and its square ( expersq ), age of the wife (age), number of 
children under 6 years old ( kidslt6 ), number of children equal to or greater than 6 years old ( kidsge6 ), and 
non-wife household income ( nwifeinc ). We treat nwifeinc as an endogenous variable because it may be 
correlated with the unobserved household preference for the labour force participation of the wife. As an 
instrument, we include the years of education of the husband ( huseduc ), since this can influence both his 
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income and the non-wife household income, but it should not influence the decision of the wife to participate 
in the labour force. Smith and Blundell (1986) considered a similar setting where non-wife income was 
considered to be endogenous and the education of the husband was employed as the instrumental variable. 
They applied the endogenous Tobit model to data derived from the 1981 Family Expenditure Survey in the 
United Kingdom. 

Using the default prior specifications, the ALDP and SNDP models are estimated for p = 0.05,0.1,... , 0.95 
by running the MCMC for 30000 iterations and discarding the first 10000 draws as the burn-in period. Con¬ 
vergence is monitored by using the trace plots and Gelman-Rubin statistic for two chains with widespread 
starting values (Gelman et al., 2014). The upper bounds of the Gelman-Rubin confidence intervals for the 
selected parameters, /3 P: educ, S p , rj p , 'yhuseduc . 7 age, and a, for SNDP in the case of p = 0.1 are 1.01, 1.01, 
1.01, 1.00, 1.00, and 1.06 , respectively. Figure [3] presents the post burn-in trace plots for these parameters 
and shows the evidence of convergence of the chains. 



0 5000 10000 15000 20000 



0 5000 10000 15000 20000 


iteration 


iteration 




0 5000 10000 15000 20000 


iteration 


iteration 



Figure 3: Post burn-in trace plots for SNDP for p = 0.1 

First, we present the results for the representative quantiles, p = 0.1, 0.5, and 0.9. Table [4] shows the 
posterior means, 95% credible intervals, and inefficiency factors for AFDP and SNDP for these quantiles. 
The table shows that the sampling algorithm worked efficiently as the inefficiency factors are reasonably 
small. The posterior means for the instrument, huseduc, are positive and the 95% credible intervals do 
not include zero for all cases for both models, implying that huseduc is a valid instrument. For p = 0.5, 
the posterior means for rj p are 0.450 and 0.446 for AFDP and SNDP, respectively, and the 95% credible 
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intervals do not include zero. Therefore, it is suggested that non-wife income be treated as an endogenous 
variable for the median regression. 


Table 4: Posterior Summary for Female Labour Data 


p Parameter 


ALDP 


SNDP 


Mean 

95% Cl 

IF Mean 

95% Cl 

IF 

0.1 (3 p constant 

-4.205 

[-10.758, 

2.430] 

13.0 -4.340 

[-11.121, 

2.288] 

18.2 

educ 

1.126 

[ 0.656, 

1.599] 

12.7 1.117 

[ 0.659, 

1.614] 

47.7 

age 

-0.436 

[ -0.565, 

-0.311] 

25.1 -0.424 

[ -0.554, 

-0.293] 

37.0 

exper 

1.070 

[ 0.731, 

1.437] 

41.3 1.051 

[ 0.723, 

1.387] 

92.3 

expersq 

-0.019 

[ -0.030. 

-0.009] 

24.7 -0.019 

[ -0.029, 

-0.009] 

52.4 

kidslt6 

-8.346 

[-11.145, 

-5.949] 

80.4 -8.296 

[-10.948, 

-5.861] 

65.6 

kigsge6 

0.068 

[ -0.487, 

0.534] 

31.7 0.045 

[ -0.528, 

0.512] 

14.4 

S p nwifeinc 

-0.284 

[ -0.584, 

0.010] 

14.1 -0.279 

[ -0.577, 

0.007] 

33.4 

T)p 

0.176 

[ -0.117, 

0.473] 

11.5 0.171 

[ -0.125, 

0.472] 

27.8 

”7 constant 

-10.117 

[-14.486, 

-5.490] 

9.4 -10.609 

[-15.023, 

-6.112] 

11.3 

huseduc 

1.013 

[ 0.771, 

1.239] 

9.7 1.037 

[ 0.812, 

1.257] 

4.9 

educ 

0.272 

[ 0.018, 

0.551] 

6.6 0.286 

[ 0.018, 

0.562] 

5.7 

age 

0.210 

[ 0.140, 

0.280] 

6.7 0.221 

[ 0.152, 

0.290] 

7.6 

exper 

-0.090 

[ -0.269, 

0.084] 

12.2 -0.122 

[ -0.301, 

0.052] 

8.5 

expersq 

-0.003 

[ -0.009, 

0.003] 

12.2 -0.002 

[ -0.008, 

0.003] 

4.7 

kidslt6 

-0.554 

[ -1.424, 

0.351] 

6.4 -0.472 

[ -1.430, 

0.469] 

5.5 

kigsge6 

0.481 

[ 0.125, 

0.838] 

6.9 0.464 

[ 0.080, 

0.839] 

7.7 

a 

0.250 

[ 0.211, 

0.298] 

33.9 0.265 

[ 0.212, 

0.322] 

78.7 

0.5 /3 p constant 

8.571 

[ -0.899, 17.634] 

7.1 8.265 

[ -1.288, 17.473] 

9.1 

educ 

1.287 

[ 0.734, 

1.889] 

11.0 1.291 

[ 0.727, 

1.895] 

8.2 

age 

-0.510 

[ -0.680, 

-0.333] 

10.9 -0.502 

[ -0.670, 

-0.321] 

12.9 

exper 

1.398 

[ 1.029, 

1.787] 

12.0 1.391 

[ 1.021, 

1.777] 

12.1 

expersq 

-0.021 

[ -0.034, 

-0.009] 

13.4 -0.021 

[ -0.034, 

-0.009] 

13.3 

kidslt6 

-9.546 

[-11.975, 

-7.305] 

14.5 -9.441 

[-11.849, 

-7.123] 

5.2 

kigsge6 

-0.255 

[ -1.116, 

0.620] 

10.9 -0.268 

[ -1.104, 

0.592] 

10.4 

S p nwifeinc 

-0.525 

[ -0.944, 

-0.159] 

15.5 -0.522 

[ -0.917, 

-0.165] 

8.5 

r)p 

0.450 

[ 0.079, 

0.885] 

14.0 0.446 

[ 0.087, 

0.852] 

7.9 

7 constant 

-10.318 

[-14.784, 

-5.689] 

12.3 -11.021 

[-15.556, 

-6.377] 

11.2 

huseduc 

1.013 

[ 0.768, 

1.242] 

9.7 1.032 

[ 0.809, 

1.251] 

7.7 

educ 

0.277 

[ 0.025, 

0.557] 

5.8 0.301 

[ 0.028, 

0.583] 

5.5 

age 

0.212 

[ 0.142, 

0.283] 

8.2 0.226 

[ 0.156, 

0.296] 

11.5 

exper 

-0.090 

[ -0.274, 

0.084] 

4.3 -0.120 

[ -0.298, 

0.054] 

9.4 

expersq 

-0.003 

[ -0.009, 

0.003] 

4.4 -0.002 

[ -0.008, 

0.003] 

8.4 

kidslt6 

-0.536 

[ -1.408, 

0.362] 

2.9 -0.447 

[ -1.413, 

0.515] 

3.1 

kigsge6 

0.491 

[ 0.136, 

0.850] 

2.7 0.468 

[ 0.082, 

0.863] 

4.9 

a 

0.250 

[ 0.212, 

0.297] 

18.6 0.263 

[ 0.215, 

0.315] 

77.0 

0.9 /3 p constant 

17.077 

[ 9.225, 25.430] 

7.5 16.957 

[ 8.985, 25.429] 

3.2 

educ 

0.405 

[ -0.107, 

0.905] 

7.9 0.420 

[ -0.102, 

0.921] 

2.1 

age 

-0.266 

[ -0.424, 

-0.112] 

6.7 -0.265 

[ -0.419, 

-0.113] 

2.7 

exper 

1.075 

[ 0.749, 

1.387] 

13.1 1.072 

[ 0.747, 

1.389] 

12.0 

expersq 

-0.018 

[ -0.026, 

-0.010] 

10.8 -0.018 

[ -0.026, 

-0.010] 

8.8 

kidslt6 

-6.014 

[ -8.373, 

-3.553] 

7.9 -6.085 

[ -8.476, 

-3.584] 

8.3 

kigsge6 

0.254 

[ -0.490, 

0.978] 

5.7 0.254 

[ -0.492, 

1.009] 

9.9 

S p nwifeinc 

-0.043 

[ -0.384, 

0.288] 

4.5 -0.050 

[ -0.380, 

0.275] 

6.7 

Vp 

-0.002 

[ -0.340, 

0.339] 

5.0 0.004 

[ -0.328, 

0.337] 

6.8 

7 constant 

-10.174 

[-14.698, 

-5.486] 

6.1 -10.741 

[-15.390, 

-5.946] 

16.1 

huseduc 

1.013 

[ 0.773, 

1.240] 

9.2 1.036 

[ 0.812, 

1.253] 

6.3 

educ 

0.274 

[ 0.021, 

0.551] 

9.5 0.292 

[ 0.017, 

0.582] 

9.5 

age 

0.211 

[ 0.138, 

0.283] 

4.2 0.223 

[ 0.151, 

0.294] 

14.0 

exper 

-0.092 

[ -0.272, 

0.085] 

5.8 -0.126 

[ -0.300, 

0.048] 

6.9 

expersq 

-0.003 

[ -0.009, 

0.003] 

7.0 -0.002 

[ -0.008, 

0.003] 

6.0 

kidslt6 

-0.550 

[ -1.435, 

0.349] 

7.2 -0.483 

[ -1.450, 

0.500] 

9.5 

kigsge6 

0.483 

[ 0.128, 

0.837] 

4.3 0.464 

[ 0.076, 

0.857] 

4.2 

a 

0.251 

[ 0.213, 

0.291] 

34.5 0.265 

[ 0.213, 

0.321] 

66.1 


To study the endogeneity in non-wife household income across quantiles, the posterior distributions of rj p 
arc presented. The results across the quantiles can be best understood by plotting the posterior distributions 


20 



as a function of p. Figure 0] shows the posterior means and 95% credible intervals of ?/ ;) for ALDP and 
SNDP for p = 0.05,0.1,, 0.95. The figure shows that the two models produced similar results and that 
the posterior distributions of rj p are concentrated away from zero for the mid quantiles. Specifically, for 
0.2 < p < 0.65, the 95% credible intervals do not include zero for either model. There are notable peaks 
around p = 0.35, where the posterior means of p p under the default prior specifications are 0.664 and 0.662 
with the 95% credible intervals (0.201,1.137) and (0.230,1.124) for ALDP and SNDP, respectively. This 
is an interesting result considering that the censoring rate is 0.43. The result implies that the effect of the 
endogeneity of non-wife income is the most profound when the wife is about to decide whether to enter the 
labour force. When the opportunity cost of labour supply is very high (lower quantile) or the wife works 
on a more regular basis (higher quantile), such endogeneity diminishes. Smith and Blundell (1986) also 
reported that non-wife income is endogenous by using the endogenous Tobit regression model. The mean 
of our dataset is 7.399, which approximately corresponds to the 0.6-th quantile. For p = 0.6, the posterior 
mean of rj p for ALDP is 0.428 with the 95% credible interval (0.036, 0.832) and that for SNDP is 0.421 
with the 95% credible interval (0.037,0.832). The figure also shows the posterior means and 95% credible 
intervals under the two alternative prior specifications considered in Section l4~3l confirming that our results 
arc robust with respect to the prior specifications. 


ALDP 


SNDP 




P 


P 


Figure 4: Posterior means and 95% credible intervals of r] p under the default and alternative priors for 
p = 0.05,0.1,..., 0.95 

Figure [5] compares the posterior means and 95% credible intervals of (fi p , 5 p )' for SNDP, ALDP, and 
TQR for p = 0.05,0.1,... , 0.95. The results for SNDP and ALDP are quite similar. The figure clearly 
shows that the posterior distributions for the key variable, nwifeinc, for the proposed models and TQR 
exhibit some differences for 0.2 < p < 0.65, where nwifeinc is indicated to be endogenous. The difference 
becomes the most profound around p = 0.35 for which the posterior mean for nwifeinc is —0.761 for ALDP, 
—0.756 for SNDP, and —0.147 for TQR, implying a stronger effect of non-wife income when endogeneity 
is taken into account. The posterior distributions for nwifeinc for ALDP and SNDP are more dispersed than 
that for TQR for all p. While the 95% credible intervals include zero for all models for the upper quantiles, 
for the lower quantiles, such as p = 0.1, those for ALDP and SNDP include zero and those for TQR do not. 

Differences in the results arc also observed for other variables. For p = 0.35, the posterior means for 
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educ and age arc respectively 1.689 and —0.513 for ALDP, 1.705 and —0.504 for SNDP, and 1.064 and 
—0.606 for TQR. For the upper quantiles, p > 0.85, the 95% credible intervals for educ include zero for 
the proposed models, while those for TQR do not, implying that an additional year of education does not 
increase the working hours for those quantiles when the endogeneity from non-wife income is taken into 
account. For expersq, the endogenous models result in slightly more dispersed posterior distributions for 
0.2 < p < 0.7. The posterior means for p = 0.35 arc —0.021, —0.020, and —0.016 for ALDP, SNDP, 
and TQR, respectively. For kidsge6 , the posterior means for p = 0.35 arc —0.274, —0.262, and —0.475 
for ALDP, SNDP, and TQR, respectively. However, the 95% credible intervals include zero for all p for all 
models. On the other hand, the figure also shows that the models produced similar results for exper and 
kidslt6 for all p. 
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Figure 5: Posterior means and 95% credible intervals of (/3' p ,5 p )' for ALDP, SNDP, and TQR for p = 
0.05,0.1,..., 0.95 

6 Conclusion 

We proposed Bayesian endogenous TQR models using parametric and semiparametric first stage regression 
models built around the zero a-th quantile assumption. The value of a determines the quantile level of the 
mode of the error distribution and is estimated from the data. From the simulation study, the AL, ALDP, and 
SNDP models worked relatively well for the various situations, while they faced the same limitation pointed 
out by Kottas and Krnjajic (2011). On the other hand, the SN model could not accommodate the fat tailed 
first stage errors. Although AEP could be a promising model in terms of flexibility, the inefficiency of the 
MCMC algorithm largely limits its applicability in practice. The development of a more convenient mixture 
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representation for the AEP distribution is thus required. From application to data on the labour supply of 
married women, the effect of the endogeneity in non-wife income was found to be the most profound for 
the quantile level close to the censoring rate. For this quantile, some differences in the parameter estimates 
between the endogenous and standard models were found, such as the stronger effect of non-wife income 
on working hours. 

This study only considered the case of continuous endogenous variables. We arc also interested in in¬ 
corporating endogenous binary variables into a Bayesian quantile regression model. An important extension 
might therefore be addressing multiple endogenous dummy variables to represent selection among multiple 
alternatives, such as the choice of a hospital and insurance plan, as considered in Geweke et al. (2003) and 
Deb et al. (2006). However, such an extension would be challenging with respect to the assumptions that 
must be imposed on the multivariate error terms. We leave these issues to future research. 
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